After looking through the data, I decided to use three metrics to find low-paid, high-performing players. I believe the most important traits of a player are the number of points they score (PTS), the number of blocks they make (BLK) and their effective field goal percentage (eFG.). Points help the team win the game, while blocks keep the opposing team from scoring too much. eFG. is included to ensure that our players have a certain level of shooting accuracy, which is desirable in basketball players.
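For reference, effective field goal percentage is conventionally defined as (FG + 0.5 * 3P) / FGA, so a made three-pointer counts one and a half times. Here is a minimal sketch of that calculation; the function name `efg` and the sample numbers are made up for illustration and are not taken from the data set.

```r
# Effective field goal percentage: (FG + 0.5 * 3P) / FGA.
# `efg` and the example numbers are illustrative, not from the real data.
efg <- function(fg, fg3, fga) {
  # Return NA when a player has no attempts, to avoid dividing by zero.
  ifelse(fga > 0, (fg + 0.5 * fg3) / fga, NA_real_)
}

# A player who makes 4 of 10 shots, 2 of them three-pointers:
example_efg <- efg(fg = 4, fg3 = 2, fga = 10)
print(example_efg)  # 0.5
```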

I started by categorizing players based on their salary. I wanted a categorical variable indicating whether a player was in the “low pay”, “middle pay” or “high pay” salary range. After much experimentation, I settled on 3 clusters for salary. The result is in the bar graph below, which shows that there is a reasonable number of players in each category.
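The salary step can be sketched roughly as follows. This assumes the categories come from a one-dimensional k-means on salary; the data frame `nba`, the column `Salary` and the synthetic values are all stand-ins I made up for the example, not the real data.

```r
# Synthetic stand-in for the real data; `nba` and `Salary` are assumed names.
set.seed(1)
nba <- data.frame(
  Player = paste("Player", 1:60),
  Salary = c(runif(20, 1e6, 3e6), runif(20, 8e6, 12e6), runif(20, 25e6, 35e6))
)

# One-dimensional k-means on salary with 3 centers.
salary_km <- kmeans(nba$Salary, centers = 3, nstart = 10)

# k-means cluster ids are arbitrary, so rank the centers to attach labels.
center_rank <- rank(salary_km$centers[, 1])
pay_labels <- c("low pay", "middle pay", "high pay")
nba$pay <- factor(pay_labels[center_rank[salary_km$cluster]], levels = pay_labels)

# Number of players per pay range (the counts behind a bar graph).
pay_counts <- table(nba$pay)
print(pay_counts)
barplot(pay_counts, ylab = "Number of players")
```

Ranking the cluster centers is what makes the labels meaningful, since the cluster numbers returned by `kmeans` carry no order of their own.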

With the players categorized by salary, I next tried to work out how many clusters were needed for the performance variables (PTS, BLK and eFG.). I used the following function to measure how much of the variance a given clustering explains.

## Evaluate several different numbers of clusters
explained_variance = function(data_in, k){
  ### Run the k-means algorithm with k clusters (fixed seed for reproducibility).
  set.seed(1)
  kmeans_obj = kmeans(data_in, centers = k, algorithm = "Lloyd", iter.max = 30)
  
  ### Variance accounted for by clusters:
  ### var_exp = between-cluster sum of squares / total sum of squares
  var_exp = kmeans_obj$betweenss / kmeans_obj$totss
  var_exp  
}

I plotted the variance explained against the number of clusters and applied the elbow method. It looks like 3 clusters is a good choice.
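The elbow plot can be produced along these lines. The function body repeats the calculation from `explained_variance` above so the sketch runs on its own; the matrix `perf`, its synthetic values and the range of k are assumptions for illustration.

```r
# Same calculation as explained_variance above, repeated so this runs alone.
explained_variance <- function(data_in, k) {
  set.seed(1)
  km <- kmeans(data_in, centers = k, algorithm = "Lloyd", iter.max = 30)
  km$betweenss / km$totss
}

# Synthetic, standardized stand-in for the performance columns (assumed names).
set.seed(42)
perf <- scale(data.frame(
  PTS  = rnorm(150, mean = 10, sd = 5),
  BLK  = rnorm(150, mean = 0.5, sd = 0.3),
  eFG. = rnorm(150, mean = 0.5, sd = 0.05)
))

# Variance explained for each candidate number of clusters.
k_values <- 1:8
var_exp <- sapply(k_values, function(k) explained_variance(perf, k))

# Elbow plot: look for the k after which the curve flattens out.
plot(k_values, var_exp, type = "b",
     xlab = "Number of clusters (k)", ylab = "Variance explained")
```

On random data like this the curve has no sharp elbow; on the real performance data the bend is what suggested 3 clusters.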

I then proceeded to create three clusters for performance and overlay the pay range clusters (low, middle and high). The results below are promising: we can see that there are some players in the “low pay” range who perform quite well in certain areas. The withinSS I got was 88.9%.
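A rough sketch of this step is shown here, using base R graphics. The data frame `players`, its columns and the standardization with `scale` are assumptions on my part (the original preprocessing is not shown), and the values are synthetic.

```r
# Synthetic stand-in data; column names and pay labels are assumptions.
set.seed(7)
n <- 90
pay_levels <- c("low pay", "middle pay", "high pay")
players <- data.frame(
  PTS  = rnorm(n, 10, 5),
  BLK  = rnorm(n, 0.5, 0.3),
  eFG. = rnorm(n, 0.5, 0.05),
  pay  = factor(sample(pay_levels, n, replace = TRUE), levels = pay_levels)
)

# Standardize so no single statistic dominates the distance calculation.
perf <- scale(players[, c("PTS", "BLK", "eFG.")])

# Three performance clusters, with the same settings as explained_variance.
set.seed(1)
perf_km <- kmeans(perf, centers = 3, algorithm = "Lloyd", iter.max = 30)
players$cluster <- factor(perf_km$cluster)

# How the pay ranges are spread across the performance clusters.
cluster_by_pay <- table(players$cluster, players$pay)
print(cluster_by_pay)

# One 2D view: color = performance cluster, plotting symbol = pay range.
plot(perf[, "PTS"], perf[, "BLK"],
     col = as.integer(players$cluster), pch = as.integer(players$pay),
     xlab = "PTS (standardized)", ylab = "BLK (standardized)")
legend("topright", legend = levels(players$pay), pch = 1:3, title = "Pay range")
```

Encoding the pay range as the plotting symbol keeps color free for the performance cluster, which makes well-performing “low pay” players easy to spot.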

However, these 2D graphs are not enough on their own to find good players. A 3D graph lets us see the clusters in their entirety. This interactive model is shown below.
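One way to build such an interactive 3D view is with the plotly package, sketched here. I am assuming plotly was the tool, which the text does not confirm; `players3d` and its values are synthetic stand-ins, and the code skips the plot if plotly is not installed.

```r
# Synthetic stand-in data; names are assumptions for illustration only.
set.seed(3)
players3d <- data.frame(
  PTS  = rnorm(50),
  BLK  = rnorm(50),
  eFG. = rnorm(50),
  pay  = sample(c("low pay", "middle pay", "high pay"), 50, replace = TRUE)
)

# Build the 3D scatter only when plotly is available.
fig <- NULL
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    players3d, x = ~PTS, y = ~BLK, z = ~eFG., color = ~pay,
    type = "scatter3d", mode = "markers"
  )
}

# Show the widget in an interactive session.
if (interactive() && !is.null(fig)) print(fig)
```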

As we can see, there are several good candidates for selection.

Brad Wanamaker is a good choice because he has a high eFG. and his salary is in the “low pay” cluster. No player has a higher eFG. than Wanamaker, even among the highly paid players. This makes him a very good choice.

Gary Clark is a good choice because he scores a lot of points and blocks a lot of shots while being in the “low pay” cluster. In addition, his eFG. is not bad (greater than 0). Compared to the other “underpaid” NBA players who score a lot of points, Gary Clark has an additional advantage because of his relatively high BLK score.

Solomon Hill is a good choice because he has a lot of blocks and scores decently (eFG. > 0 and PTS > 0.75). He massively outperforms the other players in blocks, as you can see in the chart. He even outperforms highly paid players in that regard.
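The kind of filter behind these picks can be sketched as follows. The thresholds are the ones quoted above for Solomon Hill; the data frame `candidates`, the player labels and every value in it are hypothetical, and this is not necessarily how the picks were actually made.

```r
# Hypothetical standardized stats; player labels and values are made up.
candidates <- data.frame(
  Player = c("A", "B", "C", "D", "E"),
  PTS    = c(1.2, 0.8, -0.3, 0.9, 2.0),
  BLK    = c(0.1, 2.5, 0.4, -0.2, 1.0),
  eFG.   = c(0.3, 0.2, 1.8, -0.5, 0.6),
  pay    = c("low pay", "low pay", "low pay", "low pay", "high pay"),
  stringsAsFactors = FALSE
)

# Keep low-pay players who clear the scoring thresholds quoted above.
shortlist <- subset(candidates, pay == "low pay" & eFG. > 0 & PTS > 0.75)

# Order the shortlist by blocks, best first.
shortlist <- shortlist[order(-shortlist$BLK), ]
print(shortlist$Player)  # "B" "A"
```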

Overall, I believe these players are underpaid but high-performing. Using data science, I was able to find players who had low pay but performed well in different aspects of the game. I think it’s a good idea to have 3 players who are really good at blocking, scoring points and shooting accurately. If they work together, they can be good at both offense and defense. Therefore, I highly recommend these players for our team.